Estimating the Accuracy of Automated Classification Systems Using Only Expert Ratings that are Less Accurate than the System

Author

  • Paul E. Lehner
Abstract

A method is presented to estimate the accuracy of an automated classification system based only on expert ratings of test cases, where the system may be substantially more accurate than the raters. In this method, an estimate of overall rater accuracy is derived from the level of inter-rater agreement; Bayesian updating based on the estimated rater accuracy is applied to estimate a ground truth probability for each classification on each test case; and overall system accuracy is then estimated by comparing the relative frequency with which the system agrees with the most probable classification at different probability levels. A simulation analysis provides evidence that the method yields reasonable estimates of system accuracy under diverse and predictable conditions.

Introduction

Information technology is advancing to develop systems that address problems of increasing sophistication and complexity. A quick scan of programs sponsored by showed new systems being developed to address complex problems as diverse as automated medical and clinical diagnoses, technology readiness evaluation, detection of emerging technologies, classification of the behavioral contents of unstructured video segments, recognition and classification of metaphors used in natural language text, and many others. The complexity of the problems that these advanced systems address makes it difficult to evaluate their accuracy. It is usually necessary to resort to using expert raters to assign ground truth for test cases. However, the complexity of these problems also challenges the expert raters, who often disagree as to which category is correct. Furthermore, as future systems address problems of ever-increasing sophistication and complexity, it seems likely that the experts will be even more challenged and will exhibit even lower levels of agreement. Ground truth data sets based on expert assignments are fallible and are likely to become more so in the future.
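The three steps summarized in the abstract can be illustrated with a minimal Python sketch. It rests on strong simplifying assumptions that are not taken from the paper: every rater has the same accuracy `p`, rater errors are spread uniformly over the wrong categories, raters are independent given the true category, the prior over categories is uniform, and the final step handles only two categories with system errors independent of rater errors. All function names are hypothetical, and the paper's actual estimator may differ.

```python
from itertools import combinations


def pairwise_agreement(ratings):
    """Fraction of rater pairs that agree, pooled over all test cases."""
    agree = total = 0
    for case in ratings:
        for a, b in combinations(case, 2):
            agree += (a == b)
            total += 1
    return agree / total


def rater_accuracy_from_agreement(agreement, k):
    """Solve agreement = p**2 + (1 - p)**2 / (k - 1) for the root p >= 1/k.

    Assumption: equal-accuracy raters with uniformly distributed errors.
    """
    disc = 1 - k + k * (k - 1) * agreement
    return (1 + max(disc, 0.0) ** 0.5) / k


def posterior(case, categories, p):
    """Bayesian update: P(true category | one case's ratings), uniform prior."""
    k = len(categories)
    weights = {}
    for c in categories:
        w = 1.0
        for r in case:
            # A rating matches the truth with probability p; otherwise it
            # lands on one of the k - 1 wrong categories uniformly.
            w *= p if r == c else (1 - p) / (k - 1)
        weights[c] = w
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}


def system_accuracy_binary(agree_rate, mean_top_prob):
    """Invert agree = s*q + (1 - s)*(1 - q) for system accuracy s.

    Two-category case only; undefined when mean_top_prob == 0.5.
    """
    return (agree_rate - (1 - mean_top_prob)) / (2 * mean_top_prob - 1)


# Illustrative data: three test cases, three raters each (made up).
ratings = [["A", "A", "B"], ["B", "B", "B"], ["A", "B", "A"]]
categories = ["A", "B"]
p_hat = rater_accuracy_from_agreement(pairwise_agreement(ratings), len(categories))
posteriors = [posterior(case, categories, p_hat) for case in ratings]
```

Here `agree_rate` would be the observed fraction of cases on which the system matches the most probable category, and `mean_top_prob` the average posterior probability of that category. Because the relation is linear in the posterior probability, pooling over cases is enough in this simplified two-category setting.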
Using expert raters to assign ground truth to test cases is a well-established practice. For classification problems, which are the focus of this paper, a statistic such as Kappa is used to measure inter-rater agreement, and the rating process is then refined until a satisfactory level of agreement is reached. Once the agreement threshold is reached, the assignments of individual raters or collaborating teams of raters are treated as truth, and system accuracy is measured by the level of agreement with the assigned ground truth (see Gwet, 2010, for a review). For several reasons, this common scientific practice does not adequately meet the needs of …
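As a concrete illustration of the agreement statistic mentioned above, the following sketch computes Cohen's kappa for two raters. The text says only "Kappa", so the choice of the two-rater Cohen variant is an assumption, and the function name and sample labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater1, rater2):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two raters on the same cases."""
    n = len(rater1)
    # Observed agreement: fraction of cases where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[label] / n) * (c2[label] / n) for label in c1)
    return (p_o - p_e) / (1 - p_e)


# Made-up labels for four test cases.
kappa = cohen_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"])
```

A kappa near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance. The statistic is undefined when chance agreement equals 1, for example when both raters always assign the same single label.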

Similar resources

Testing the accuracy of automated classification systems using only expert ratings that are less accurate than the system

A method is presented to estimate the accuracy of automated classification systems using only expert ratings that may be substantially less accurate than the systems being evaluated. The estimation method begins with multiple expert ratings on test cases, uses the level of inter-rater agreement to estimate rater accuracy, uses Bayesian updating based on estimated rater accuracy to estimate a “g...

Retrieval–travel-time model for free-fall-flow-rack automated storage and retrieval system

Automated storage and retrieval systems (AS/RSs) are material handling systems that are frequently used in manufacturing and distribution centers. The modelling of the retrieval–travel time of an AS/RS (expected product delivery time) is practically important, because it allows us to evaluate and improve the system throughput. The free-fall-flow-rack AS/RS has emerged as a new technology for dr...

Feasibility Study of Real-time and Automated Monitoring of Iranian Rivers using 50-kHz Fluvial Acoustic Tomography System

Acoustic Tomography (AT) technique is an innovative method for real-time river monitoring. In this study, not only the accuracy of flow velocity measurement using 50 kHz AT system which is appropriate for narrow rivers (most Iranian rivers) is evaluated, but also its performance is compared with 30 kHz one which is used in wide rivers. The comparison results showed that the velocity resolutions...

Improvement in Differential GPS Accuracy using Kalman Filter

Global Positioning System (GPS) is proven to be an accurate positioning sensor. However, there are several sources of errors such as ionosphere and troposphere effects, satellite time errors, errors of orbit data, receivers errors, and errors resulting from multi-path effect which reduce the accuracy of low-cost GPS receivers. These sources of errors also limit the use of single-frequency GPS r...

Automatic road crack detection and classification using image processing techniques, machine learning and integrated models in urban areas: A novel image binarization technique

The quality of the road pavement has always been one of the major concerns for governments around the world. Cracks in the asphalt are one of the most common road tensions that generally threaten the safety of roads and highways. In recent years, automated inspection methods such as image and video processing have been considered due to the high cost and error of manual metho...


Journal title:

Volume   Issue

Pages  -

Publication date 2015